V2 Validating mousetracking

First mouse-tracking experiment with transitions and rewards revealed on hover. Simple behavioral analyses suggest that the hovers provide some signal about planning, but not in exactly the way we expected.

Methods

full experiment demo

  • take N steps in an undirected graph to maximize reward
  • reward and transition functions change every trial
  • hover over a state to reveal both its reward and connected states

Procedure

  • introduce graph and how to move
  • collect each of the four possible rewards {-10, -5, 5, 10} once
  • ternary choice trials to check image/reward learning
    • each block has two trials where each possible reward (besides the very worst) should be chosen (six trials total)
  • the choice set always includes the next-best reward, e.g. for a trial where 5 is the best option, the choice set would be {-10, -5, 5} or {-10, 5, 5}
    • repeat with randomly generated choice sets until perfect performance, or fail out of experiment after 5 blocks
  • introduce multi-step
  • check that they understand they can go back to a previously visited state
  • practice trials with fully revealed graph
  • start varying transition function from trial to trial
  • introduce hidden rewards/transitions and hovering to reveal
  • three full practice rounds
  • 32 main trials, which are analyzed

Results

Example trials

These gifs show the 30th trial from each participant, at approximately real-time speed. Click a gif to expand it and watch from the beginning. You can view all trials ✨here✨.

We begin by addressing the two main planned analyses from the grant. Then we consider the data from a broader perspective.

N.B.

  • I’m using “fixation” to refer to a period of time when a person is hovering over one of the states (similar for “fixating” a state)
  • P01 is participant 1; P01-T01 is the first (non-practice) trial for participant 1

In the current draft of the grant, we predict: (1A) People’s fixations should be predictive of the choices they make. Does this hold in the mouse hovering data?

tl;dr

Search is certainly predictive of choice, but it’s not clear how it affects choice; the relationship is not as we predicted. The bidirectional causal relationship between consideration and intention complicates things.

Probability of visiting a state by interaction of reward and fixation

Intuitively, people’s decisions should be more influenced by rewards associated with fixated (vs. un-fixated) states. However, if we take this literally, it is necessarily true, because you can’t possibly visit a state without first hovering over it. Thus, we exclude “fixations” that immediately precede a click. That is, we predict visiting a state based on whether the state was previously considered.

v2-consideration_choice.png

Logistic regression: visited ~ reward * previously_considered

| Term                         | Est.   | S.E.  | z val. | p        |
|------------------------------|--------|-------|--------|----------|
| reward                       | 2.077  | 0.450 | 4.618  | p < .001 |
| previously_considered        | 2.303  | 0.503 | 4.578  | p < .001 |
| reward:previously_considered | -0.352 | 0.462 | -0.763 | p = .445 |

We see a strong main effect of fixation on path choice: people are more likely to visit states that they have looked at previously. However, there is no interaction with reward. You’re more likely to visit a state that you considered previously regardless of its reward.
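The regression above can be sketched as follows with statsmodels, which accepts the same R-style formula. The data here are simulated with main effects only (mirroring the reported pattern); the column names are assumptions about the analysis dataframe, not the actual pipeline.

```python
# Sketch of the logistic regression `visited ~ reward * previously_considered`.
# Simulated data; coefficients and column names are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "reward": rng.choice([-10, -5, 5, 10], size=n) / 10.0,  # rescaled
    "previously_considered": rng.integers(0, 2, size=n),
})
# Generate choices from main effects only (no interaction term)
logit = 2.0 * df["reward"] + 2.3 * df["previously_considered"] - 1.0
df["visited"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("visited ~ reward * previously_considered", data=df).fit(disp=0)
print(model.params)
```

The `*` in the formula expands to both main effects plus the `reward:previously_considered` interaction, matching the three rows of the table.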

Interaction of reward and proportion fixation time

The above result could potentially be explained by people’s tendency to fixate every state. What if we use a continuous measure of attention? Looking at the proportion of fixation time, we see an even stronger version of the same thing. States that receive a lot of attention are likely to be visited regardless of their reward.

v2-prop_fixation_choice.png

Logistic regression: visited ~ reward * prop_fixated

| Term                | Est.   | S.E.  | z val. | p        |
|---------------------|--------|-------|--------|----------|
| reward              | 1.732  | 0.351 | 4.939  | p < .001 |
| prop_fixated        | 31.033 | 2.290 | 13.549 | p < .001 |
| reward:prop_fixated | -1.387 | 2.306 | -0.602 | p = .547 |

Possible explanation

What’s going on here? One possible explanation is that people consider courses of action that they intend to take. That is, there is a bidirectional relationship between the current plan and the search process. This prevents us from getting a clean measurement of the effect of search on choice. It seems that the tendency to consider paths that you already intend to take is so strong that this washes out the other causal direction (consideration × value → intention).

```mermaid
flowchart LR
    Value --> Consideration
    Consideration --> Intention
    Intention --> Consideration
    Value --> Intention
    Intention --> Choice
```

Is search directed towards rewarding states?

tl;dr

Yes. People are less likely to continue searching down a path ending in a negative reward. And when they do switch paths, they tend to switch to higher-value ones.

Our second main prediction is: (1B) People’s fixations should be preferentially directed towards high-reward states. Note that this is a general prediction consistent with both best-first search and pruning.

Probability of continuing a path by reward of last state on path

A simple indicator of reward-directed search is the probability that people continue planning down a path as a function of the rewards revealed so far. Indeed, we see that people are more likely to continue a rollout when the last-fixated state is rewarding.

v2-continue_chain.png

Note that we don’t count going back to the previously fixated state as “continuing”, even though it is technically a valid next state (because the graph is undirected). It’s more likely that this reflects going back to the previous node in the decision tree.
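The analysis above amounts to grouping fixations by the just-revealed reward and averaging a "continued" indicator. A minimal sketch, assuming a fixation log with one row per fixation (all column names are hypothetical; revisits of the immediately preceding state would already be coded as not continuing):

```python
# Sketch: P(continue current path) by the reward revealed at the last fixation.
# `reward` = reward of the just-fixated state; `continued` = 1 if the next
# fixation extended the same path. Column names are illustrative assumptions.
import pandas as pd

def continuation_rate(fixations: pd.DataFrame) -> pd.Series:
    """Mean continuation probability, grouped by just-revealed reward."""
    return fixations.groupby("reward")["continued"].mean()

toy = pd.DataFrame({
    "reward":    [-10, -10, -5, 5, 5, 10, 10, 10],
    "continued": [  0,   0,  1, 1, 0,  1,  1,  1],
})
print(continuation_rate(toy))
```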

Jumping decisions

The previous analysis focuses on whether to continue a rollout. What do people do when they decide not to continue with the current path? Do they preferentially “jump” to states with higher expected value? Unfortunately, we can’t really define expected value in this task without a model (how do you handle unknown transitions?). For now, we assume the transitions are known, but account for which rewards have been seen.

We see that people do show a slight tendency to fixate states with higher value (compared to the average value of other states they could have fixated). It’s a small effect, but it’s statistically significant (t-test vs. 0: \(p < .001\)). Excluding the zero cases, relative value is positive 60% of the time.

v2-jumps.png
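The quantity being tested is the value of the jumped-to state minus the mean value of the other states that could have been fixated instead, compared against zero with a one-sample t-test. A sketch with toy numbers (the function name and inputs are hypothetical):

```python
# Sketch of the jump analysis: relative value of the chosen jump target
# versus the mean of the alternatives, tested against 0. Toy data only.
import numpy as np
from scipy import stats

def relative_value(target_value, alternative_values):
    """Value of the jumped-to state minus the mean of the alternatives."""
    return target_value - float(np.mean(alternative_values))

# One relative-value score per jump (illustrative numbers):
rvs = [
    relative_value(5, [-10, -5]),   # 12.5
    relative_value(10, [5, -5]),    # 10.0
    relative_value(-5, [-10, 5]),   # -2.5
]
t, p = stats.ttest_1samp(rvs, 0.0)
```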

Basic behavior

For completeness, here we look at some simple performance metrics, broken down by participant.

Relative score

Relative score is defined as (human - avg_random) / (optimal - avg_random). People usually get a perfect score (1.0), suggesting that the task might be too easy.

v2-relative_score.png
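The relative-score formula, as a function (a sketch; variable names follow the definition in the text, so 1.0 means optimal play and 0.0 means the random baseline):

```python
# Relative score: (human - avg_random) / (optimal - avg_random).
# 1.0 = optimal performance, 0.0 = the average-random baseline.
def relative_score(human: float, optimal: float, avg_random: float) -> float:
    return (human - avg_random) / (optimal - avg_random)

# e.g. scoring 18 when optimal is 20 and random averages 4:
relative_score(18, 20, 4)  # → 0.875
```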

Number of states fixated

People fixate every state on about half of all trials. This presents a challenge for any analysis focusing on whether or not a state was fixated.

v2-n_fixated.png

Total number of fixations

v2-n_fix.png

Fixation durations

v2-duration.png

Next up

  • continuance probability by full path value
    • but how to define full path value?
    • maybe the full expected value given all revealed information?
      • that’s actually really hard to determine because the transition function is not known!
      • I guess you could integrate across uncertainty in the transitions?
  • backwards vs forward
    • undirected really kills us here
    • people do tend to fixate the starting position first (not discussed above)
  • look for a scanning stage